222 PART 5 Looking for Relationships with Correlation and Regression
Summary statistics for the residuals
If you read about summarizing data in Chapter 9, you know that the distribution
of values from a numerical variable is reported using summary statistics, such as
mean, standard deviation, median, minimum, maximum, and quartiles. Summary
statistics for residuals are what you should expect to find in the residuals section
of your software’s output. Here’s what you see in Figure 16-4 at the top under
Residuals:
» The minimum and maximum values: These are labeled as Min and Max,
respectively, and represent the two most extreme residuals, or the two points that lie
farthest away from the least-squares line in either direction. The minimum is
negative, indicating it is below the line, while the positive maximum is above
the line. The minimum is almost 21 mmHg below the line, while the maximum
lies about 17 mmHg above the line.
» The first and third quartiles: These are labeled 1Q and 3Q on the output.
Looking under 1Q, which is the first quartile, you can tell that about 25 percent
of the data points (which would be 5 out of 20) lie more than 4.7 mmHg below
the fitted line. For the third quartile results, you see that another 25 percent
lie more than 6.5 mmHg above the fitted line. The remaining 50 percent of the
points lie within those two quartiles.
» The median: Labeled Median on the output, a median of –3.4 tells you that half
of the residuals, which is 10 of the 20 data points, are less than –3.4, and half are
greater than –3.4. The negative sign means the median lies below the fitted line.
Note: The mean isn’t included in these summary statistics because the mean of
the residuals is always exactly 0 for any least-squares regression that includes an
intercept term.
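To make the residual summary concrete, here is a minimal sketch in Python that fits a least-squares line and reports the same five statistics. The data values and the names weight_kg, sbp_mmhg, and fit_line are invented for illustration; they are not the data behind Figure 16-4. The quartile rule used here interpolates between data points, and other software may use a slightly different rule.

```python
import statistics

# Hypothetical data, invented for illustration (NOT the data behind Figure 16-4)
weight_kg = [60, 65, 70, 75, 80, 85, 90, 95, 100, 105]
sbp_mmhg = [118, 112, 130, 121, 140, 128, 135, 150, 138, 155]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares straight line."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = fit_line(weight_kg, sbp_mmhg)

# A residual is the observed value minus the value predicted by the line
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(weight_kg, sbp_mmhg)]

# Quartiles of the residuals (interpolating between data points)
q1, median, q3 = statistics.quantiles(residuals, n=4, method="inclusive")

summary = {
    "Min": min(residuals),
    "1Q": q1,
    "Median": median,
    "3Q": q3,
    "Max": max(residuals),
}
for label, value in summary.items():
    print(f"{label:>6}: {value:8.3f}")

# With an intercept in the model, the residuals average out to zero
# (up to floating-point rounding)
print(f"Mean of residuals: {statistics.fmean(residuals):.6f}")
```

The minimum comes out negative and the maximum positive, and the mean of the residuals is zero apart from rounding, which is why the mean is left out of the summary.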
The residual standard error, often called the root-mean-square (RMS) error in regression
output, is a measure of how tightly or loosely the points scatter above or
below the fitted line. You can think of it as the standard deviation (SD) of the residuals,
although it’s computed in a slightly different way from the usual SD of a set
of numbers. For a straight-line regression, RMS uses N – 2 instead of N – 1 in the
denominator of the SD formula, because two parameters (the intercept and the slope)
are estimated from the data. At the bottom of Figure 16-4, Residual standard error is
expressed as 9.838 mmHg. You can think of it as another summary statistic for residuals.
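The difference between the two denominators can be sketched in a few lines of Python. The residual values below are made up for illustration (chosen so they sum to zero) and do not come from Figure 16-4; the variable names are likewise assumptions.

```python
import math
import statistics

# Hypothetical residuals, invented for illustration; they sum to zero
residuals = [-6.0, -3.5, -2.0, -0.5, 0.0, 1.0, 1.5, 2.5, 3.0, 4.0]
n = len(residuals)

# Sum of squared residuals (their mean is zero, so no centering is needed)
sum_sq = sum(r ** 2 for r in residuals)

# Usual sample SD: divide by N - 1
usual_sd = math.sqrt(sum_sq / (n - 1))

# Residual standard error for a straight-line fit: divide by N - 2,
# because the intercept and the slope were both estimated from the data
residual_se = math.sqrt(sum_sq / (n - 2))

print(f"Usual SD (N - 1):                {usual_sd:.3f}")
print(f"Residual standard error (N - 2): {residual_se:.3f}")
```

Because the denominator is smaller, the residual standard error is always a little larger than the usual SD of the same residuals, and the gap shrinks as N grows.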
Graphs of the residuals
Most regression programs will produce different graphs of the residuals if
requested in code. You can use these graphs to assess whether the data meet the
criteria for executing a least-squares straight-line regression. Figure 16-6 shows
two of the more common types of residual graphs. The one on the left is called a
residuals versus fitted graph, and the one on the right is called a normal Q-Q graph.
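As a rough sketch of what goes into these two graphs, the Python below computes the points each one plots, without drawing anything. The data are invented for illustration, and the plotting positions (i – 0.5)/N used for the Q-Q points are one common convention, assumed here; some software standardizes the residuals first or uses slightly different positions.

```python
from statistics import NormalDist, fmean

# Hypothetical data, invented for illustration
x = [60, 65, 70, 75, 80, 85, 90, 95, 100, 105]
y = [118, 112, 130, 121, 140, 128, 135, 150, 138, 155]

# Least-squares straight line
mean_x, mean_y = fmean(x), fmean(y)
slope = (
    sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    / sum((xi - mean_x) ** 2 for xi in x)
)
intercept = mean_y - slope * mean_x

fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Residuals versus fitted: one (fitted value, residual) point per observation
resid_vs_fitted = list(zip(fitted, residuals))

# Normal Q-Q: sorted residuals paired with the quantiles expected
# from a standard normal distribution (assumed plotting positions)
n = len(residuals)
theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
qq_points = list(zip(theoretical, sorted(residuals)))

for fit_value, resid in resid_vs_fitted:
    print(f"fitted {fit_value:7.2f}   residual {resid:7.2f}")
for theo, resid in qq_points:
    print(f"normal quantile {theo:6.2f}   sorted residual {resid:7.2f}")
```

Feeding resid_vs_fitted and qq_points to any plotting tool gives scatter plots of the two kinds named above.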